WordCloud of sleep text scores
In [1]:
%%bash
sudo apt update
sudo apt install fonts-ipaexfont # for Japanese in wordcloud
In [2]:
!pip install wordcloud
!pip install japanize-matplotlib # for Japanese in matplotlib graph
Import libraries
In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import japanize_matplotlib # for Japanese in matplotlib graph
from wordcloud import WordCloud, STOPWORDS
Set up working directory
In [4]:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/Documents/ds2024/dsF1/
Mounted at /content/drive
/content/drive/MyDrive/Documents/ds2024/dsF1
Parameters
In [5]:
csv_in = 'sleep-text-score-wakati.csv'
Read CSV file
In [6]:
df = pd.read_csv(csv_in, sep=',', skiprows=0, header=0)
print(df.shape)
print(df.info())
display(df.head())
(426, 4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 426 entries, 0 to 425
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   text               426 non-null    object
 1   GPT-4o             426 non-null    int64
 2   Gemini-1.5-Pro     426 non-null    int64
 3   Claude-3.5-Sonnet  426 non-null    int64
dtypes: int64(3), object(1)
memory usage: 13.4+ KB
None
| | text | GPT-4o | Gemini-1.5-Pro | Claude-3.5-Sonnet |
|---|---|---|---|---|
| 0 | 就寝 時間 毎日 一定 する | 2 | 2 | 2 |
| 1 | 朝日 積極的 浴びる | 2 | 2 | 2 |
| 2 | 寝室 温度 18 -22度 保つ | 2 | 2 | 2 |
| 3 | 就寝 前 ストレッチ 体 リラックス さ せる | 2 | 2 | 2 |
| 4 | 寝具 定期的 清潔 保つ | 2 | 2 | 2 |
Check the number of documents in each category
In [7]:
print(df['GPT-4o'].value_counts().sort_index(ascending=True))
GPT-4o
0    164
1     61
2    201
Name: count, dtype: int64
Generating WordCloud
In [8]:
fpath = "/usr/share/fonts/opentype/ipaexfont-gothic/ipaexg.ttf"
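WordCloud needs a font file that contains Japanese glyphs, so it can help to confirm that the file exists before generating any images. The following is a minimal sketch, not part of the original notebook: `find_font` and `CANDIDATE_FONTS` are hypothetical names, and the second candidate path assumes the alternatives link that the `fonts-ipaexfont-gothic` package registers on Ubuntu.

```python
from pathlib import Path

# Candidate font files (assumption: Ubuntu with fonts-ipaexfont installed).
# The first is the path used in this notebook; the second is the
# alternatives link that the package is assumed to register.
CANDIDATE_FONTS = [
    "/usr/share/fonts/opentype/ipaexfont-gothic/ipaexg.ttf",
    "/usr/share/fonts/truetype/fonts-japanese-gothic.ttf",
]


def find_font(candidates):
    """Return the first candidate path that is an existing file, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


font = find_font(CANDIDATE_FONTS)
if font is None:
    print("No Japanese font found; run the apt install cell first.")
else:
    print(f"Using font: {font}")
```

If a font is found, the returned path could be assigned to `fpath` in place of the hard-coded string.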
In [11]:
sorted_labels = sorted(df['GPT-4o'].unique())
for label in sorted_labels:
    text_data = df[df['GPT-4o'] == label]['text'].str.cat(sep=' ')
    wc = WordCloud(width=800, height=400, background_color='white',
                   font_path=fpath).generate(text_data)
    plt.figure(figsize=(10, 5))
    plt.imshow(wc)
    plt.axis('off')
    plt.title(f'Word Cloud for Label: {label}')
    plt.show()
In [13]:
excluded_words = set(['寝る', '前', '直前', 'する', '就寝', '寝室'])
sorted_labels = sorted(df['GPT-4o'].unique())
for label in sorted_labels:
    text_data = df[df['GPT-4o'] == label]['text'].str.cat(sep=' ')
    wc = WordCloud(width=800, height=400, background_color='white',
                   font_path=fpath, stopwords=STOPWORDS.union(excluded_words)).generate(text_data)
    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.title(f'Word Cloud for Label: {label}')
    plt.show()
In [15]:
excluded_words = set(['寝る', '前', '直前', 'する', '就寝', '寝室', '夜', '見る'])
sorted_labels = sorted(df['GPT-4o'].unique())
for label in sorted_labels:
    text_data = df[df['GPT-4o'] == label]['text'].str.cat(sep=' ')
    wc = WordCloud(width=800, height=400, background_color='white',
                   font_path=fpath, stopwords=STOPWORDS.union(excluded_words)).generate(text_data)
    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.title(f'Word Cloud for Label: {label}')
    plt.show()
In [16]:
excluded_words = set(['寝る', '前', '直前', 'する', '就寝', '寝室', '夜', '見る', '軽い', '楽しむ'])
sorted_labels = sorted(df['GPT-4o'].unique())
for label in sorted_labels:
    text_data = df[df['GPT-4o'] == label]['text'].str.cat(sep=' ')
    wc = WordCloud(width=800, height=400, background_color='white',
                   font_path=fpath, stopwords=STOPWORDS.union(excluded_words)).generate(text_data)
    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.title(f'Word Cloud for Label: {label}')
    plt.show()
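The cells above grow `excluded_words` step by step after inspecting each set of images. One way to support that choice with numbers is to count the space-separated tokens directly. The sketch below is an illustration, not part of the original notebook: `top_words` is a hypothetical helper, and the sample rows are the five texts shown by `df.head()`. WordCloud applies its own tokenization, so these counts only approximate what the images show.

```python
from collections import Counter


def top_words(texts, excluded=frozenset(), n=10):
    """Count whitespace-separated tokens, skipping any in `excluded`."""
    counts = Counter()
    for text in texts:
        counts.update(tok for tok in text.split() if tok not in excluded)
    return counts.most_common(n)


# Sample rows copied from the df.head() output; in the notebook, pass
# df[df['GPT-4o'] == label]['text'] instead (assumption: same column names).
sample = [
    "就寝 時間 毎日 一定 する",
    "朝日 積極的 浴びる",
    "寝室 温度 18 -22度 保つ",
    "就寝 前 ストレッチ 体 リラックス さ せる",
    "寝具 定期的 清潔 保つ",
]
excluded_words = {'寝る', '前', '直前', 'する', '就寝', '寝室'}

print(top_words(sample, n=3))
print(top_words(sample, excluded=excluded_words, n=3))
```

Running this per label before and after adding a word to `excluded_words` shows which frequent tokens remain, which may help decide what to exclude next.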